Word-Order Issues in English-to-Urdu Statistical Machine Translation
Abstract
We investigate phrase-based statistical machine translation between English and Urdu, two Indo-European languages that differ significantly in their word-order preferences. Reordering of words and phrases is thus a necessary part of the translation process. While phrase-based systems model local reordering well, long-distance reordering is known to be a hard problem. We perform experiments using the Moses SMT system and discuss the reordering models available in Moses. We then present our novel, Urdu-aware yet generalizable approach, based on reordering phrases in the syntactic parse tree of the source English sentence. Our technique significantly improves the quality of English-Urdu translation with Moses, both in terms of BLEU score and in subjective human judgments.
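To make the idea of reordering phrases in the source parse tree concrete, the following is a minimal sketch only. It assumes a toy tree encoded as nested tuples and a single hypothetical rule (move verbs to the end of each verb phrase, approximating the shift from English subject-verb-object order to Urdu subject-object-verb order). The rule, the tag set, and the function names are illustrative assumptions, not the rule set or implementation described in the paper.

```python
# Toy constituency tree: phrases are (label, child, child, ...),
# leaves are (pos_tag, word). This encoding is an assumption for illustration.

# Penn Treebank-style verb tags treated as "verbs" by the hypothetical rule.
VERB_TAGS = {"VB", "VBD", "VBZ", "VBP", "VBN", "VBG", "MD"}


def is_leaf(node):
    """A leaf is a (tag, word) pair whose second element is a string."""
    return len(node) == 2 and isinstance(node[1], str)


def reorder(node):
    """Return a new tree in which, inside every VP, verb leaves follow
    all other children (hypothetical SVO -> SOV rule)."""
    if is_leaf(node):
        return node
    label = node[0]
    children = [reorder(child) for child in node[1:]]
    if label == "VP":
        verbs = [c for c in children if is_leaf(c) and c[0] in VERB_TAGS]
        others = [c for c in children if not (is_leaf(c) and c[0] in VERB_TAGS)]
        children = others + verbs
    return (label, *children)


def leaves(node):
    """Collect the words of the tree in left-to-right order."""
    if is_leaf(node):
        return [node[1]]
    return [word for child in node[1:] for word in leaves(child)]


tree = (
    "S",
    ("NP", ("PRP", "she")),
    ("VP", ("VBD", "read"), ("NP", ("DT", "the"), ("NN", "book"))),
)

print(" ".join(leaves(reorder(tree))))  # she the book read
```

In a preprocessing setup of this kind, the reordered English word sequence, rather than the original one, would be passed to the phrase-based system for training and decoding, so that long-distance movement is handled before translation.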
Similar resources
Development of Parallel Corpus and English to Urdu Statistical Machine Translation
In this paper we share our efforts to develop a parallel corpus for statistical machine translation of English text into Urdu. Certain issues faced during this effort are shared and discussed.
Model for English-Urdu Statistical Machine Translation
There are more than 60 million first-language speakers of Urdu and more than 104 million second-language speakers. A lot of the knowledge on the internet that is available and useful to these speakers of Urdu is in English. The contrast in typology of both languages is interesting to study for Statistical Machine Translation. However, there is almost no parallel aligned data available freely for the selected language pa...
A Word Reordering Model for Improved Machine Translation
Preordering of source side sentences has proved to be useful in improving statistical machine translation. Most work has used a parser in the source language along with rules to map the source language word order into the target language word order. The requirement to have a source language parser is a major drawback, which we seek to overcome in this paper. Instead of using a parser and then u...
Urdu and Hindi: Translation and sharing of linguistic resources
Hindi and Urdu share a common phonology, morphology and grammar but are written in different scripts. In addition, the vocabularies have also diverged significantly, especially in the written form. In this paper we show that we can get reasonable quality translations (we estimated the Translation Error Rate at 18%) between the two languages even in the absence of a parallel corpus. Linguistic resour...
Developing English-Urdu Machine Translation Via Hindi
The paper presents a strategy for deriving English-to-Urdu translation using an English-to-Hindi MT system. The English-Hindi lexical database is used to collect all possible Hindi words and phrases. These are further augmented by including their morphological variations and attaching all possible postpositions. This list is used to provide a mapping from Hindi to Urdu. There may be change in gender...
Journal: Prague Bull. Math. Linguistics
Volume: 95
Publication date: 2011